TSASNet: Tooth segmentation on dental panoramic X-ray images by Two-Stage Attention Segmentation Network
Yue Zhao1,2,    Pengcheng Li1,2,    Chenqiang Gao1,2,    Yang Liu3,
Qiaoyi Chen1,2,    Feng Yang1,2,    Deyu Meng4,5
1 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications
2 Chongqing Key Laboratory of Signal and Information Processing
3 Stomatological Hospital of Chongqing Medical University
4 Macau Institute of Systems Engineering, Macau University of Science and Technology
5 School of Mathematics and Statistics and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University
Knowledge-Based Systems (KBS), 2020
[PDF] [bibtex]
Abstract
Tooth segmentation plays a crucial and fundamental role in dentistry, supporting diagnosis and treatment planning. In this paper, we propose a Two-Stage Attention Segmentation Network (TSASNet) for dental panoramic X-ray images to address the difficulty of segmenting tooth boundaries and tooth roots under low contrast and uneven intensity distribution. In the first stage, we adopt an attention model embedded with global and local attention modules to roughly localize the tooth regions. Without any interactive operation, this attention model automatically aggregates pixel-wise contextual information and identifies coarse tooth boundaries. In the second stage, to recover the final boundary information, a fully convolutional network segments the actual tooth areas from the attention maps produced by the first stage. The effectiveness of TSASNet is demonstrated on a benchmark dataset of 1,500 dental panoramic X-ray images: our method achieves 96.94% accuracy, 92.72% Dice, and 93.77% recall, significantly outperforming current state-of-the-art methods.
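The two-stage design can be summarized in a few lines of PyTorch. The sketch below is an illustration only, not the authors' released code: TwoStageSegNet, attention_net, and fcn are hypothetical names standing in for the stage-1 attention model and the stage-2 fully convolutional network.

import torch.nn as nn

class TwoStageSegNet(nn.Module):
    """Hypothetical sketch of the two-stage TSASNet pipeline."""

    def __init__(self, attention_net: nn.Module, fcn: nn.Module):
        super().__init__()
        self.attention_net = attention_net  # stage 1: coarse tooth localization
        self.fcn = fcn                      # stage 2: fine boundary segmentation

    def forward(self, x):
        att_map = self.attention_net(x)  # pixel-wise attention map of tooth regions
        logits = self.fcn(att_map)       # final per-pixel segmentation logits
        return logits, att_map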
Network Overview
Fig.1 The proposed TSASNet pipeline, which involves two stages. The first stage adopts a pixel-wise contextual attention network to obtain attention maps, while the second stage segments the attention maps to produce the final segmentation results.
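To make "pixel-wise contextual attention" concrete, the following simplified sketch of a global attention step lets every pixel predict a weight for every spatial position and then aggregates the feature map accordingly. This is a deliberately reduced illustration; GlobalPixelAttention and its 1x1-convolution weight predictor are assumptions for exposition, not the exact module used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalPixelAttention(nn.Module):
    """Simplified global pixel-wise contextual attention (illustrative only)."""

    def __init__(self, channels, size):
        super().__init__()
        self.size = size  # expects square feature maps with h == w == size
        # 1x1 conv predicts, for each pixel, one weight per spatial position
        self.att = nn.Conv2d(channels, size * size, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape                     # assumes h == w == self.size
        weights = self.att(x)                    # (b, h*w, h, w)
        weights = weights.view(b, h * w, h * w)  # (b, context position, pixel)
        weights = F.softmax(weights, dim=1)      # normalize over context positions
        feats = x.view(b, c, h * w)              # (b, c, context position)
        context = torch.bmm(feats, weights)      # weighted feature sum per pixel
        return context.view(b, c, h, w)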
 
Results
We compare the proposed TSASNet with recent state-of-the-art learning-based and non-learning-based methods. Training and testing are implemented end-to-end in PyTorch on 4 NVIDIA GeForce GTX 1080Ti GPUs.
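A minimal end-to-end training sketch for that setup could look as follows, reusing the TwoStageSegNet sketch above. The loss, optimizer, learning rate, and the names train_loader and num_epochs are assumptions for illustration; the page does not specify them.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# nn.DataParallel splits each batch across the available GPUs (4 here).
model = nn.DataParallel(TwoStageSegNet(attention_net, fcn)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer and lr
criterion = nn.BCEWithLogitsLoss()  # assumed loss: binary tooth vs. background

for epoch in range(num_epochs):
    for images, masks in train_loader:  # train_loader yields (image, mask) batches
        images, masks = images.to(device), masks.to(device)
        logits, _ = model(images)        # forward through both stages
        loss = criterion(logits, masks)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()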
Result Gallery
Fig.2 Qualitative comparison with state-of-the-art learning-based segmentation methods. The original images and the corresponding ground truth are shown in the first two rows. Rows 3 to 7 show the segmentation results of U-Net, BiSeNet, DenseASPP, SegNet, and the proposed method, respectively. Red circles highlight the fine distinctions among the ground truth, SegNet, and the proposed method.
Fig.3 Qualitative comparison with non-learning-based methods. The original images and the corresponding ground truth are shown in the first two rows. Rows 3 to 6 show the segmentation results of the splitting-and-merging, level set, and fuzzy C-means methods and the proposed method, respectively.
Fig.4 Metric comparison across the 10 tooth-condition categories. Metrics are computed separately for each image category on the test set. The green line denotes the median of each metric; red points denote outliers.
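For reference, the three reported metrics can be computed from a binary predicted mask and its ground truth as below; treating them as per-image scores that are then averaged within each category is our assumption about the aggregation.

import numpy as np

def segmentation_metrics(pred, gt):
    """Accuracy, Dice, and recall for binary masks (0/1 arrays of equal shape)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    tn = np.logical_and(~pred, ~gt).sum()  # true negatives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return accuracy, dice, recall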
 
©Pengcheng Li. Last update: 2022.01